Existing approaches for generating multitrack music with transformer models have been limited to either a small set of instruments or short music segments. This is partly due to the memory requirements of the lengthy input sequences demanded by existing representations for multitrack music. In this work, we propose a compact representation that allows a diverse set of instruments while keeping a short sequence length. Using the proposed representation, we introduce the Multitrack Music Transformer (MTMT) for learning long-term dependencies in multitrack music. In a subjective listening test, our proposed model achieves competitive quality on unconditioned generation against two baseline models. We also show that our proposed model can generate samples that are twice as long as those produced by the baseline models and, moreover, can do so in half the inference time. Furthermore, we propose a new measure for analyzing musical self-attention and show that the trained model learns to attend less to notes that form a dissonant interval with the current note, but more to notes that are 4N beats away from the current note. Finally, our findings provide a novel foundation for future work exploring longer-form multitrack music generation and improving musical self-attention. All source code and audio samples are available at https://salu133445.github.io/mtmt/.
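As a rough illustration of why a compact multitrack representation shortens sequences, the sketch below encodes each note as a single fixed-width event and compares the sequence length against a token-per-field baseline. The field names and layout are illustrative assumptions, not the paper's exact encoding.

```python
# Hypothetical sketch: one event per note vs. one token per note attribute.
from dataclasses import dataclass

@dataclass
class NoteEvent:
    beat: int        # bar-level onset time
    position: int    # sub-beat position within the bar
    pitch: int       # MIDI pitch number (0-127)
    duration: int    # note length in sub-beat steps
    program: int     # General MIDI program (instrument) number

def encode_compact(notes):
    """One sequence element per note: length == number of notes."""
    return [(n.beat, n.position, n.pitch, n.duration, n.program) for n in notes]

def encode_tokenwise(notes):
    """A token-per-field baseline: length == 5x the number of notes."""
    seq = []
    for n in notes:
        seq += [("beat", n.beat), ("pos", n.position), ("pitch", n.pitch),
                ("dur", n.duration), ("prog", n.program)]
    return seq

notes = [NoteEvent(0, 0, 60, 4, 0), NoteEvent(0, 2, 64, 2, 0), NoteEvent(1, 0, 43, 8, 33)]
print(len(encode_compact(notes)), "vs", len(encode_tokenwise(notes)))  # 3 vs 15
```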
Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large but weakly labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify the audio targets for separation, allowing zero-shot generalization. Our approach uses a single model for source separation of multiple sound types and relies solely on weakly labeled training data. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18 while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types held out from training. The model achieves Source-to-Distortion Ratio (SDR) performance comparable to current supervised models in both cases.
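A minimal PyTorch sketch of query-conditioned separation in the spirit of the pipeline above: a query embedding (e.g. produced from sound-event detection) is processed into conditioning vectors that modulate a mask-prediction network over the mixture spectrogram. The layer sizes and the FiLM-style conditioning are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class QueryConditionedSeparator(nn.Module):
    def __init__(self, n_freq=257, query_dim=128, hidden=256):
        super().__init__()
        # Plays the role of a "latent embedding processor" for the query.
        self.query_proc = nn.Sequential(
            nn.Linear(query_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * hidden))
        self.encoder = nn.GRU(n_freq, hidden, batch_first=True)
        self.mask_head = nn.Linear(hidden, n_freq)

    def forward(self, mix_spec, query_emb):
        # mix_spec: (batch, time, freq) magnitude spectrogram of the mixture
        h, _ = self.encoder(mix_spec)
        scale, shift = self.query_proc(query_emb).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)   # condition on the query
        mask = torch.sigmoid(self.mask_head(h))           # per-bin soft mask
        return mix_spec * mask                            # estimated target source

model = QueryConditionedSeparator()
est = model(torch.rand(2, 100, 257), torch.rand(2, 128))
print(est.shape)  # torch.Size([2, 100, 257])
```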
Music pieces are structured hierarchically, from sonic events to melodies, and sequentially, in the form of repetition and variation. Music from different cultures establishes different aesthetics through different stylistic conventions on these two aspects. We propose a framework that can quantitatively compare music from different cultures by examining both aspects. The framework is based on a model of music information dynamics, the Variable Markov Oracle (VMO), extended with variational learning of audio. A Variational Autoencoder (VAE) is trained to map audio fragments into a latent representation. The latent representation is then fed into the VMO, which learns a clustering of the latent representation by maximizing the information rate of the quantized latent-representation sequence. A threshold effectively controls the prediction step's sensitivity to acoustic changes, which determines the framework's ability to track repetition at longer time scales. This approach allows characterizing the overall information content of a musical signal at each level of acoustic sensitivity. Our findings under this framework show higher sensitivity to subtle acoustic changes in East Asian musical traditions, while Western works exhibit longer-lasting dynamic structures at thresholds that admit larger differences in the latent space. This suggests that a profile of information content, analyzed as a function of the level of acoustic detail, can serve as a possible cultural characteristic.
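The simplified sketch below illustrates the kind of threshold sweep described above: latent frame vectors (stand-ins for VAE encoder outputs) are quantized into symbols by a distance threshold, and a crude first-order information-rate proxy is computed at each threshold. The real framework uses the VMO's oracle structure rather than the bigram estimate used here; everything below is an assumption for illustration only.

```python
import numpy as np

def quantize(latents, threshold):
    """Greedy clustering: a frame joins the nearest existing centroid within threshold."""
    centroids, symbols = [], []
    for z in latents:
        dists = [np.linalg.norm(z - c) for c in centroids]
        if dists and min(dists) < threshold:
            symbols.append(int(np.argmin(dists)))
        else:
            centroids.append(z)
            symbols.append(len(centroids) - 1)
    return np.array(symbols)

def information_rate(symbols):
    """Crude proxy: H(x_n) - H(x_n | x_{n-1}) estimated from bigram counts."""
    vals, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    h_marginal = -(p * np.log2(p)).sum()
    pairs = list(zip(symbols[:-1], symbols[1:]))
    h_cond = 0.0
    for prev in vals:
        nxt = np.array([b for a, b in pairs if a == prev])
        if len(nxt) == 0:
            continue
        _, c = np.unique(nxt, return_counts=True)
        q = c / c.sum()
        h_cond += (len(nxt) / len(pairs)) * -(q * np.log2(q)).sum()
    return h_marginal - h_cond

latents = np.random.rand(500, 16)          # stand-in for VAE latent frames
for t in [0.5, 1.0, 1.5]:                  # sweep the acoustic-sensitivity threshold
    print(t, information_rate(quantize(latents, t)))
```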
Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed using the predicted distribution. While showing substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., the predicted distribution is massed for a single or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may hamper beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. Our proposed approach does not require further training beyond the original fine-tuning, nor additional model parameters. In fact, we find that our proposed method requires significantly less inference computation than current approaches. We propose aggregating the top M layers, potentially leveraging useful information encoded in intermediate layers, and relaxing model confidence. We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements in particular when applied to low-resource scenarios.
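A minimal sketch of the decoding idea above, under assumed shapes: per-frame logits from the top M layers of a fine-tuned ASR model are averaged and softened with a temperature before being handed to the beam-search decoder. The layer outputs and vocabulary below are synthetic placeholders, and the decoder itself is not shown.

```python
import numpy as np

def aggregate_top_layers(layer_logits, m=3, temperature=2.0):
    """layer_logits: (num_layers, time, vocab). Returns relaxed log-probabilities."""
    agg = layer_logits[-m:].mean(axis=0) / temperature       # average top M layers, soften
    agg -= agg.max(axis=-1, keepdims=True)                    # numerical stability
    return agg - np.log(np.exp(agg).sum(axis=-1, keepdims=True))

layer_logits = np.random.randn(24, 50, 32)   # e.g. a 24-layer model, 50 frames, 32 tokens
log_probs = aggregate_top_layers(layer_logits, m=3, temperature=2.0)
print(log_probs.shape)                        # (50, 32) -> fed to beam search
```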
When used in complex engineered systems, such as communication networks, artificial intelligence (AI) models should be not only as accurate as possible, but also well calibrated. A well-calibrated AI model is one that can reliably quantify the uncertainty of its decisions, assigning high confidence levels to decisions that are likely to be correct and low confidence levels to decisions that are likely to be erroneous. This paper investigates the application of conformal prediction as a general framework to obtain AI models that produce decisions with formal calibration guarantees. Conformal prediction transforms probabilistic predictors into set predictors that are guaranteed to contain the correct answer with a probability chosen by the designer. Such formal calibration guarantees hold irrespective of the true, unknown, distribution underlying the generation of the variables of interest, and can be defined in terms of ensemble or time-averaged probabilities. In this paper, conformal prediction is applied for the first time to the design of AI for communication systems in conjunction with both frequentist and Bayesian learning, focusing on demodulation, modulation classification, and channel prediction.
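The set-predictor construction described above can be illustrated with a standard split-conformal sketch: calibration scores (one minus the probability assigned to the true label) define a quantile, and the prediction set keeps every class whose score falls below it, yielding coverage of at least 1 - alpha on average. The probabilities below are synthetic placeholders rather than outputs of any of the communication tasks studied in the paper.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]            # nonconformity scores
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample quantile
    return [np.where(1.0 - p <= q)[0] for p in test_probs]        # per-example label sets

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=200)   # calibration predictive distributions
cal_labels = rng.integers(0, 4, size=200)         # calibration ground-truth labels
test_probs = rng.dirichlet(np.ones(4), size=5)    # test predictive distributions
print(conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1))
```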
Artificial Intelligence (AI) systems have been increasingly used to make decision-making processes faster, more accurate, and more efficient. However, such systems are also at constant risk of being attacked. While the majority of attacks targeting AI-based applications aim to manipulate classifiers or training data and alter the output of an AI model, recently proposed Sponge Attacks against AI models aim to impede the classifier's execution by consuming substantial resources. In this work, we propose \textit{Dual Denial of Decision (DDoD) attacks against collaborative Human-AI teams}. We discuss how such attacks aim to deplete \textit{both computational and human} resources, and significantly impair decision-making capabilities. We describe DDoD on human and computational resources and present potential risk scenarios in a series of exemplary domains.
Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment by learning an intermediate model of the environment that predicts future interactions. When predicting a sequence of interactions, the rollout length, which limits the prediction horizon, is a critical hyperparameter, since the accuracy of the predictions degrades in regions far from real experience. As a result, overly long rollouts can ultimately lead to worse policies overall. The hyperparameter therefore offers a trade-off between quality and efficiency. In this work, we frame the problem of tuning the rollout length as a meta-level sequential decision-making problem that optimizes the final policy learned by model-based reinforcement learning, given a fixed budget of environment interactions, by dynamically adjusting the hyperparameter based on feedback from the learning process, such as the accuracy of the model and the remaining budget of interactions. We use model-free deep reinforcement learning to solve the meta-level decision problem and demonstrate that our approach outperforms common heuristic baselines on two well-known reinforcement learning environments.
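A hypothetical, self-contained sketch of the meta-level loop described above: after each model-based training iteration, a meta-policy observes features such as the learned model's error and the remaining interaction budget and selects the rollout length for the next iteration. All training routines are stubs, and the meta-policy is a random placeholder; the paper learns it with model-free deep reinforcement learning.

```python
import random

ROLLOUT_CHOICES = [1, 2, 4, 8, 16]

def meta_policy(model_error, budget_left):
    # Placeholder for a learned policy over the meta-state (model_error, budget_left).
    return random.choice(ROLLOUT_CHOICES)

def model_based_training(total_budget=10_000, per_iter=500):
    used, model_error = 0, 1.0
    while used < total_budget:
        horizon = meta_policy(model_error, 1 - used / total_budget)
        # ... collect `per_iter` real interactions, fit the dynamics model,
        # generate model rollouts of length `horizon`, and update the policy ...
        model_error *= 0.95          # stub: the model improves with more data
        used += per_iter
        print(f"budget used {used:5d}  chosen rollout length {horizon}")

model_based_training()
```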
The second edition of Deep Learning Interviews is home to hundreds of fully solved problems from a wide range of key topics in AI. It is designed both for rehearsing interview- or exam-specific topics and for providing machine learning M.Sc./Ph.D. students, and those awaiting an interview, with a comprehensive overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they are framed within thought-provoking questions and engaging stories. That is what makes the volume so specifically valuable to students and job seekers: it gives them the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book's contents form a large body of material on a wide range of topics relevant to DL job interviews and graduate-level exams, placing this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.
Generating adversarial examples is the art of creating noise that is added to the input signal of a classifying neural network so that it changes the network's classification, while keeping the noise as tenuous as possible. While this topic is well studied in the 2D regime, it lags behind in the 3D regime, i.e., attacking a classification network that operates on 3D point clouds or meshes, for example, one that classifies the pose of people's 3D scans. As of now, the vast majority of papers describe adversarial attacks in this regime via optimization methods. In this technical report, we suggest a neural network that generates the attacks. This network utilizes PointNet's architecture with some alterations. While the previous articles on which we based our work had to optimize each shape separately, tailoring an attack from scratch for each individual input without any learning, we attempt to create a unified model that can deduce the needed adversarial example with a single forward run.
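A hypothetical PyTorch sketch of a single-forward-pass attack generator in the spirit of the report: a PointNet-style network maps an input point cloud to a small per-point perturbation, which is added back to form the adversarial cloud. The layer sizes and tanh-bounded perturbation are illustrative assumptions, not the report's exact architecture or training objective.

```python
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    def __init__(self, eps=0.02):
        super().__init__()
        self.eps = eps
        # Shared per-point MLP (PointNet-style), implemented with 1x1 convolutions.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv1d(1024 + 3, 256, 1), nn.ReLU(),
            nn.Conv1d(256, 3, 1))

    def forward(self, points):
        # points: (batch, 3, num_points)
        feat = self.point_mlp(points)
        global_feat = feat.max(dim=2, keepdim=True).values       # symmetric pooling
        global_feat = global_feat.expand(-1, -1, points.shape[2])
        delta = self.decoder(torch.cat([global_feat, points], dim=1))
        delta = self.eps * torch.tanh(delta)                      # keep the noise small
        return points + delta                                     # adversarial cloud

gen = PerturbationGenerator()
adv = gen(torch.rand(4, 3, 1024))
print(adv.shape)  # torch.Size([4, 3, 1024])
```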
Two of the main principles underlying the life cycle of an artificial intelligence (AI) module in communication networks are adaptation and monitoring. Adaptation refers to the need to adjust the operation of an AI module depending on the current conditions; while monitoring requires measures of the reliability of an AI module's decisions. Classical frequentist learning methods for the design of AI modules fall short on both counts of adaptation and monitoring, catering to one-off training and providing overconfident decisions. This paper proposes a solution to address both challenges by integrating meta-learning with Bayesian learning. As a specific use case, the problems of demodulation and equalization over a fading channel based on the availability of few pilots are studied. Meta-learning processes pilot information from multiple frames in order to extract useful shared properties of effective demodulators across frames. The resulting trained demodulators are demonstrated, via experiments, to offer better calibrated soft decisions, at the computational cost of running an ensemble of networks at run time. The capacity to quantify uncertainty in the model parameter space is further leveraged by extending Bayesian meta-learning to an active setting. In it, the designer can select in a sequential fashion channel conditions under which to generate data for meta-learning from a channel simulator. Bayesian active meta-learning is seen in experiments to significantly reduce the number of frames required to obtain an efficient adaptation procedure for new frames.
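A minimal numpy sketch of the ensemble-based soft demodulation mentioned above: each sampled model (e.g. drawn from the approximate Bayesian posterior produced by meta-learning) yields per-symbol class probabilities, and averaging them gives better-calibrated soft decisions at the cost of running the whole ensemble. The per-model logits below are synthetic placeholders, not outputs of the paper's demodulators.

```python
import numpy as np

def ensemble_soft_decisions(per_model_logits):
    """per_model_logits: (num_models, num_symbols, constellation_size)."""
    logits = per_model_logits - per_model_logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    return probs.mean(axis=0)            # Bayesian model average over the ensemble

rng = np.random.default_rng(0)
soft = ensemble_soft_decisions(rng.normal(size=(10, 32, 4)))  # 10 models, 32 symbols, QPSK
print(soft.shape, soft.sum(axis=-1)[:3])                       # each row sums to 1
```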